1. INTRODUCTION


Normalized Elo was introduced in [...]. The primary motivation for normalized Elo is that it measures the number of games needed to show, at a given level of significance, that one engine is stronger than another. In other words, it is an objective measure of strength difference.

In this document we make some cosmetic changes to the terminology introduced in loc. cit. In particular, what was called “normalized Elo” will now be called “normalized t-value”, and we redefine normalized Elo as the normalized t-value multiplied by an appropriate normalization constant. This is done to make the comparison with ordinary logistic Elo more intuitive.


2. BACKGROUND 


The normalized t-value for the strength difference of two engines is defined as

$$t_n = \frac{\mu - 1/2}{\sigma_{pg}}$$

where $\mu$ is the expected score and $\sigma_{pg}$ is the standard deviation of the score per game. In the trinomial case $\sigma_{pg}$ is the standard deviation of the outcome distribution of a game, scored as 0, 1/2, 1. In the pentanomial case $\sigma_{pg}$ is the standard deviation of the outcome distribution of a game pair multiplied by $\sqrt{2}$, where we score the outcome of a game pair as 0, 1/4, 2/4, 3/4, 1.

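The justification for this convention is that, whatever testing system we use, the normalized t-value $\hat t_n$ of a test is defined to be the usual t-value divided by the square root of the number of games. Then $t_n$ is the asymptotic expectation value of $\hat t_n$. More precisely, for large $N$ (and the small values of $t_n$ that occur in practice) we have approximately

$$\hat t_n \sim \mathcal{N}\left(t_n, \frac{1}{N}\right)$$

that is, $\hat t_n$ is approximately normal with mean $t_n$ and variance $1/N$, where $N$ is the number of games.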


In the trinomial case, or in the pentanomial case with a perfectly balanced book, we have for engines of (nearly) equal strength

$$\sigma_{pg} = \tfrac{1}{2}\sqrt{1-d} \tag{2.1}$$

where $d$ is the draw ratio.
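The definitions of $t_n$ and $\sigma_{pg}$ translate directly into code. The following Python sketch (function names are hypothetical, not from the original) computes both from outcome probabilities. In the pentanomial check, a perfectly balanced book is modelled, as an assumption of this sketch, by two independent games with the same trinomial distribution; this reproduces (2.1).

```python
import math


def _mean_sd(scores, probs):
    """Mean and standard deviation of a discrete distribution."""
    mu = sum(p * a for p, a in zip(probs, scores))
    var = sum(p * (a - mu) ** 2 for p, a in zip(probs, scores))
    return mu, math.sqrt(var)


def trinomial_t_n(p_loss, p_draw, p_win):
    """Return (t_n, sigma_pg) for single games scored 0, 1/2, 1."""
    mu, sd = _mean_sd((0.0, 0.5, 1.0), (p_loss, p_draw, p_win))
    return (mu - 0.5) / sd, sd


def pentanomial_t_n(probs):
    """Return (t_n, sigma_pg) for game pairs scored 0, 1/4, 2/4, 3/4, 1.

    sigma_pg is the standard deviation of the pair score multiplied by sqrt(2).
    """
    mu, sd = _mean_sd((0.0, 0.25, 0.5, 0.75, 1.0), probs)
    sigma_pg = math.sqrt(2.0) * sd
    return (mu - 0.5) / sigma_pg, sigma_pg


if __name__ == "__main__":
    # Equal engines with draw ratio d = 0.6: compare sigma_pg with sqrt(1 - d) / 2.
    t, s = trinomial_t_n(0.2, 0.6, 0.2)
    print(t, s, 0.5 * math.sqrt(1 - 0.6))
    # The same engines, as two independent games per pair (assumed balanced book).
    t, s = pentanomial_t_n([0.04, 0.24, 0.44, 0.24, 0.04])
    print(t, s)
```

Both models give the same $\sigma_{pg}$ here, which is the point of the $\sqrt{2}$ factor in the pentanomial convention.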
3. NORMALIZATION
Below we put

$$S(x) = \frac{1}{1 + 10^{-x/400}}$$

This is the function which converts ordinary (“logistic”) Elo into an expected score. It is convenient to write

$$S(x) = L(\beta x)$$

where $\beta = \log(10)/400$ and $L$ is the usual logistic function

$$L(x) = \frac{1}{1 + e^{-x}}$$


$L$ satisfies the differential equation

$$L'(x) = L(x)\,\big(1 - L(x)\big)$$


Let

$$C_{e/t} := \frac{2}{\beta} = \frac{800}{\log(10)} \approx 347.43$$

We claim that for small Elo differences we have

$$t_n \approx \frac{1}{C_{e/t}}\,\frac{e_l}{2\sigma_{pg}} \tag{3.1}$$


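where $e_l$ is the logistic Elo difference between the two engines. To see this, write $s = \mu$ for the expected score and note that, to first order in $s - 1/2$,

$$e_l \approx \frac{s - S(0)}{S'(0)} = \frac{s - 1/2}{\beta L'(0)} = \frac{s - 1/2}{\frac{1}{2}\left(1 - \frac{1}{2}\right)\beta} = \frac{4(s - 1/2)}{\beta}$$

Hence

$$\frac{1}{C_{e/t}}\,\frac{e_l}{2\sigma_{pg}} \approx \frac{\beta}{2}\cdot\frac{4(s - 1/2)}{2\beta\,\sigma_{pg}} = \frac{s - 1/2}{\sigma_{pg}} = t_n$$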
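A small numerical check of the linearization behind (3.1), as a sketch: it compares the exact inverse of $S$ with the first-order approximation $4(s - 1/2)/\beta$. The helper names are illustrative. The approximation is accurate to a small fraction of an Elo point at 5 Elo and is off by a few Elo at 100 Elo.

```python
import math

BETA = math.log(10) / 400   # beta, so that S(x) = L(BETA * x)
C_ET = 2 / BETA             # C_{e/t} = 800 / log(10), roughly 347.43


def logistic(x):
    """The logistic function L(x) = 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + math.exp(-x))


def expected_score(elo):
    """S(x): logistic Elo difference -> expected score."""
    return logistic(BETA * elo)


def logistic_elo_exact(score):
    """Exact inverse of S."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def logistic_elo_linear(score):
    """First-order approximation e_l ~ 4 (s - 1/2) / beta."""
    return 4.0 * (score - 0.5) / BETA


if __name__ == "__main__":
    for elo in (1, 5, 20, 100):
        s = expected_score(elo)
        print(elo, round(logistic_elo_exact(s), 6), round(logistic_elo_linear(s), 4))
```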


We define the normalized Elo difference between two engines as

$$e_n := C_{e/t}\, t_n \tag{3.2}$$


In the case of a perfectly balanced book it follows from (3.1) and (2.1) that

$$e_n \approx \frac{e_l}{\sqrt{1-d}} \tag{3.3}$$

This simple formula is the motivation for the normalization introduced in (3.2). We see in particular that for $d = 0$ normalized Elo and logistic Elo coincide. For other draw ratios we have the following conversion table.


| Draw ratio | 0.00 | 0.30 | 0.50 | 0.60 | 0.70 | 0.80 | 0.90 |
|---|---|---|---|---|---|---|---|
| Normalized Elo | 5.00 | 5.00 | 5.00 | 5.00 | 5.00 | 5.00 | 5.00 |
| Logistic Elo | 5.00 | 4.18 | 3.54 | 3.16 | 2.74 | 2.24 | 1.58 |
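The conversion (3.3) and its inverse, sketched in Python under the same assumptions (small Elo differences, perfectly balanced book). The function names are hypothetical; the loop should reproduce the logistic row of the table above.

```python
import math


def logistic_to_normalized(e_l, draw_ratio):
    """(3.3): e_n ~ e_l / sqrt(1 - d)."""
    return e_l / math.sqrt(1.0 - draw_ratio)


def normalized_to_logistic(e_n, draw_ratio):
    """Inverse of (3.3): e_l ~ e_n * sqrt(1 - d)."""
    return e_n * math.sqrt(1.0 - draw_ratio)


if __name__ == "__main__":
    # Logistic Elo corresponding to a normalized Elo difference of 5.
    for d in (0.0, 0.3, 0.5, 0.6, 0.7, 0.8, 0.9):
        print(f"d = {d:.2f}: logistic Elo = {normalized_to_logistic(5.0, d):.2f}")
```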


Let us now discuss the duration of an SPRT test for $H_0: e_n = e_{n,0}$ versus $H_1: e_n = e_{n,1}$. Assuming that the Type I and Type II error probabilities are both equal to $\alpha = 0.05$, the worst-case expected duration of the test in games (which corresponds to the actual Elo being halfway between $H_0$ and $H_1$) is given by

$$T = \frac{D}{(e_{n,1} - e_{n,0})^2} \tag{3.4}$$

where

$$D := C_{e/t}^2 \log(19)^2 \approx 1046535$$


This leads to the following table.

| Normalized Elo difference | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Expected duration (games) | 1046535 | 261634 | 116282 | 65408 | 41861 | 29070 |


Note that $D$ is close to 1000000, which is sufficiently accurate for back-of-the-envelope calculations.
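The duration table follows from (3.4) alone. A minimal sketch, with a hypothetical function name:

```python
import math

C_ET = 800 / math.log(10)                 # C_{e/t}
D = C_ET ** 2 * math.log(19) ** 2         # numerator of (3.4)


def worst_case_duration(e_n0, e_n1):
    """(3.4): expected number of games when the true normalized Elo lies halfway
    between H0 and H1, with both error probabilities equal to 0.05."""
    return D / (e_n1 - e_n0) ** 2


if __name__ == "__main__":
    print(round(D))  # about 1046535
    print([round(worst_case_duration(0, k)) for k in range(1, 7)])
```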


Let us derive formula (3.4). We may equivalently consider an SPRT of $t_n = t_{n,0}$ versus $t_n = t_{n,1}$. Let us suppose that $\sigma_{pg}$ is known (see below for a discussion), so that it is sufficient to consider an SPRT for $\mu = s_0$ versus $\mu = s_1$, for suitable $s_0, s_1$. According to [6] the expected duration of such a test, when the actual expected score is $\mu$, is equal to

$$\frac{T(h_\mu)}{\Delta^2}$$

where

$$\Delta = \frac{s_1 - s_0}{\sigma_{pg}} = t_{n,1} - t_{n,0}, \qquad h_\mu = \frac{2\mu - (s_0 + s_1)}{s_1 - s_0} = \frac{2t_n - (t_{n,0} + t_{n,1})}{t_{n,1} - t_{n,0}},$$

$$T(h) = \frac{2b}{h}\,\frac{1 - e^{-hb}}{1 + e^{-hb}}, \qquad b = \log\left(\frac{1 - \alpha}{\alpha}\right),$$

when the Type I and Type II error probabilities are both equal to $\alpha$.

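The worst case occurs when $t_n = (t_{n,0} + t_{n,1})/2$. In that case $\mu = (s_0 + s_1)/2$ and hence $h_\mu = 0$. Applying l'Hôpital's rule we find $T(0) = b^2$, so the expected duration is

$$\frac{T(0)}{(t_{n,1} - t_{n,0})^2} = \frac{b^2}{(t_{n,1} - t_{n,0})^2} = \frac{C_{e/t}^2\, b^2}{(e_{n,1} - e_{n,0})^2}$$

If $\alpha = 0.05$ then $b = \log(19)$ and we obtain (3.4).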


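For completeness we note that in case $t_n = t_{n,0}$ or $t_n = t_{n,1}$ (that is, $h_\mu = \mp 1$) the expected duration is given by a formula similar to (3.4), where the numerator is replaced by

$$C_{e/t}^2\, T(\pm 1) = \frac{9}{5}\, C_{e/t}^2 \log(19) \approx 639770$$

since for $b = \log(19)$ we have $e^{-b} = 1/19$ and hence $T(\pm 1) = 2b \cdot \frac{1 - 1/19}{1 + 1/19} = \frac{9}{5}\, b$.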
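The duration formula can be checked numerically. This sketch implements $T(h)$ using the identity $(1 - e^{-x})/(1 + e^{-x}) = \tanh(x/2)$ and handles the limit $T(0) = b^2$ explicitly; the names `T` and `expected_duration` are illustrative. It expresses $h_\mu$ directly in normalized Elo, which is allowed because the factor $C_{e/t}$ cancels in the ratio.

```python
import math

C_ET = 800 / math.log(10)  # C_{e/t}


def T(h, b):
    """T(h) = (2b/h) (1 - e^{-hb}) / (1 + e^{-hb}), extended by T(0) = b^2.

    (1 - e^{-x}) / (1 + e^{-x}) equals tanh(x / 2), which avoids overflow for large |h b|.
    """
    if abs(h) < 1e-12:
        return b * b  # limit h -> 0 (l'Hopital)
    return 2.0 * b / h * math.tanh(h * b / 2.0)


def expected_duration(e_n, e_n0, e_n1, alpha=0.05):
    """Expected number of games of the SPRT of e_n0 vs e_n1 when the true value is e_n."""
    b = math.log((1 - alpha) / alpha)
    # h_mu written in normalized Elo; C_{e/t} cancels between numerator and denominator.
    h = (2 * e_n - (e_n0 + e_n1)) / (e_n1 - e_n0)
    return C_ET ** 2 * T(h, b) / (e_n1 - e_n0) ** 2


if __name__ == "__main__":
    b = math.log(19)
    print(round(C_ET ** 2 * T(0.0, b)))  # D from (3.4)
    print(round(C_ET ** 2 * T(1.0, b)))  # numerator when the truth is H0 or H1
```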
4. LLR COMPUTATION


What we call an SPRT is strictly speaking a GSPRT [...], which is based on monitoring the Generalized Log Likelihood Ratio (denoted LLR below) of $H_1$ versus $H_0$. See [2] for an introduction.


4.1. The exact LLR. Assume given real numbers

$$a_1 < a_2 < \cdots < a_l$$

and a discrete probability distribution

$$P: \{a_1, \ldots, a_l\} \to \mathbb{R} : a_i \mapsto p_i$$

with mean $\mu$ and standard deviation $\sigma$. Assume a sample taken from $\{a_1, \ldots, a_l\}$ according to $P$ has sample distribution $(\hat p_i)_{i=1,\ldots,l}$. Let $\mu_{ref}$ be some reference value. Put

$$t = \frac{\mu - \mu_{ref}}{\sigma} \tag{4.1}$$
We will give a numerical procedure to compute the MLE for $(p_i)_i$ given the empirical distribution $\hat p$, subject to the constraint $t = t_*$ for a given $t_*$. Although in practice this procedure appears to converge rapidly, as confirmed by simulation, we mention the following caveats:
• we have not proved that the MLE is unique;
• we have not proved convergence, rapid or not.


Anyway, keeping this in mind, we explain the procedure. We must maximize

$$\sum_i \hat p_i \log p_i \tag{4.2}$$

subject to

$$\sum_i p_i = 1 \tag{4.3}$$

$$\frac{\mu - \mu_{ref}}{\sigma} = t_* \tag{4.4}$$


Let $(m_j)_{j \ge 0}$ be the moments of $P$, i.e. $m_j = \sum_i p_i a_i^j$. We rewrite (4.4) as

$$\phi(P) = m_1 - \mu_{ref}\, m_0 - t_* \sqrt{m_2 m_0 - m_1^2} = 0 \tag{4.5}$$


The distinguishing feature of $\phi$ is that it is homogeneous of degree 1 in $(p_i)_i$. We will now assume that $\phi(P)$ is an arbitrary expression in $(p_i)_i$, homogeneous of degree $k \neq 0$. To compute the extremal value(s) of (4.2) subject to (4.3) and the condition $\phi(P) = 0$ we use Lagrange multipliers. That is, we need to solve $\partial \mathcal{L}/\partial p_i = 0$ where

$$\mathcal{L} = \sum_i \hat p_i \log p_i - \lambda\Big(\sum_i p_i - 1\Big) - \theta\, \phi(P)$$

and where in addition (4.3) and $\phi(P) = 0$ are satisfied. In other words

$$\frac{\hat p_i}{p_i} - \lambda - \theta\, \phi_i(P) = 0, \qquad i = 1, \ldots, l \tag{4.6}$$


for $\phi_i = \partial\phi/\partial p_i$. For use below we note the Euler identity

$$\sum_i p_i\, \phi_i(P) = k\, \phi(P) \tag{4.7}$$
Rewriting (4.6) as

$$\hat p_i = p_i\big(\lambda + \theta\, \phi_i(P)\big)$$

and summing over $i$, using (4.3), $\phi(P) = 0$ and (4.7), we obtain $\lambda = 1$. Conversely, if $\lambda = 1$ then $\sum_i p_i = 1$. In other words we are reduced to solving the following system:


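$$\begin{aligned} p_i\big(1 + \theta\, \phi_i(P)\big) &= \hat p_i, \qquad i = 1, \ldots, l \\ \sum_i p_i\, \phi_i(P) &= 0 \end{aligned} \tag{4.8}$$

where we have used (4.7) again (here we use $k \neq 0$ to have that $\sum_i p_i \phi_i(P) = 0 \iff \phi(P) = 0$). This suggests the following numerical procedure for solving (4.8).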
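Assume we have an estimate $P_n$ for the MLE distribution. Then determine $\theta_{n+1}$ such that

$$\sum_i \frac{\hat p_i\, \phi_i(P_n)}{1 + \theta_{n+1}\, \phi_i(P_n)} = 0 \tag{4.9}$$

and put

$$p_{n+1,i} = \frac{\hat p_i}{1 + \theta_{n+1}\, \phi_i(P_n)} \tag{4.10}$$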


Note that automatically $\sum_i p_{n+1,i} = 1$ (exercise). In order to have $0 < p_{n+1,i}$ we should only consider solutions of (4.9) satisfying

$$-1/v < \theta_{n+1} < -1/u$$

where $u = \min_i \phi_i(P_n)$ and $v = \max_i \phi_i(P_n)$ (such solutions are unique). Furthermore, in order for (4.9) to have a solution we should also have $uv < 0$.
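In the case where $\phi(P)$ is as in (4.5) we obtain, at a distribution with $\sum_i p_i = 1$,

$$\phi_i(P) = a_i - \mu_{ref} - \frac{t_*\sigma}{2}\left(1 + \left(\frac{a_i - \mu}{\sigma}\right)^2\right)$$

so we should use this expression in (4.9) and (4.10).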


The question remains what we should take for $P_0$. An obvious choice is $P_0 = \hat p$, but then it sometimes happens that the condition $uv < 0$ is not satisfied. A much safer choice seems to be the uniform distribution, i.e. $p_i = 1/l$ for all $i$.
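A minimal Python sketch of the procedure of Section 4.1, assuming all empirical frequencies $\hat p_i$ are positive. Equation (4.9) is solved by bisection on $(-1/v, -1/u)$. This is one possible root finder, not necessarily the author's; it is safe here because the left-hand side of (4.9) is strictly decreasing on that interval. The last function forms the log likelihood ratio as $n \sum_i \hat p_i \log(p_{1,i}/p_{0,i})$ from the two constrained MLEs. This is the standard generalized likelihood ratio and is an assumption of this sketch, since the text does not spell that step out. All names are hypothetical.

```python
import math


def moments(a, p):
    """Mean and standard deviation of the distribution p on the points a."""
    mu = sum(pi * ai for pi, ai in zip(p, a))
    sigma = math.sqrt(sum(pi * (ai - mu) ** 2 for pi, ai in zip(p, a)))
    return mu, sigma


def phi_grad(a, p, mu_ref, t_star):
    """phi_i(P) for phi as in (4.5), valid when sum(p) == 1."""
    mu, sigma = moments(a, p)
    return [ai - mu_ref - t_star * sigma / 2 * (1 + ((ai - mu) / sigma) ** 2)
            for ai in a]


def solve_theta(p_hat, phi, iters=200):
    """Solve (4.9) for theta in (-1/v, -1/u) by bisection.

    The left-hand side of (4.9) is strictly decreasing in theta on that interval.
    """
    u, v = min(phi), max(phi)
    if not u < 0 < v:
        raise ValueError("(4.9) has no admissible solution: need u < 0 < v")
    lo, hi = -1.0 / v, -1.0 / u
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid in (lo, hi):  # interval can no longer be split in floating point
            break
        f = sum(q * g / (1 + mid * g) for q, g in zip(p_hat, phi))
        if f > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def constrained_mle(a, p_hat, mu_ref, t_star, tol=1e-12, max_iter=1000):
    """Iterate (4.9)-(4.10) starting from the uniform distribution P_0."""
    p = [1.0 / len(a)] * len(a)
    for _ in range(max_iter):
        phi = phi_grad(a, p, mu_ref, t_star)
        theta = solve_theta(p_hat, phi)
        p_new = [q / (1 + theta * g) for q, g in zip(p_hat, phi)]  # (4.10)
        if max(abs(x - y) for x, y in zip(p_new, p)) < tol:
            return p_new
        p = p_new
    return p


def exact_llr(a, counts, mu_ref, t0, t1):
    """Assumed GLLR: n * sum_i p_hat_i * log(p1_i / p0_i) with constrained MLEs p0, p1."""
    n = sum(counts)
    p_hat = [c / n for c in counts]
    p0 = constrained_mle(a, p_hat, mu_ref, t0)
    p1 = constrained_mle(a, p_hat, mu_ref, t1)
    return n * sum(q * math.log(x1 / x0) for q, x0, x1 in zip(p_hat, p0, p1))
```

For game pairs one would take `a = [0, 0.25, 0.5, 0.75, 1]` and `mu_ref = 0.5`, with t-values of the pair score as in Remark 4.1 below. With `t_star = 0` the constraint is linear and the iteration stops after a single update.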


4.2. An approximation. In [...] a relatively elegant method was given to compute the LLR for an SPRT for the mean of a multinomial distribution. In fact this can be obtained from the approach in Section 4.1 by taking $\phi(P) = m_1 - s\, m_0$, where $s$ is the hypothesized mean.

In [5] an approximation for the LLR was derived, and in [...] it was compared to the exact one. It seems a good strategy to use this approximation if in addition we estimate $\sigma$ (necessary to convert the mean into a t-value) from the test itself.

Then by [...] the LLR for an SPRT of $H_0: \mu = s_0$ versus $H_1: \mu = s_1$ may be approximated by


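$$\mathrm{LLR} \approx \frac{n}{2}\log\left(\frac{1 + \left(\frac{\hat\mu - s_0}{\hat\sigma}\right)^2}{1 + \left(\frac{\hat\mu - s_1}{\hat\sigma}\right)^2}\right) \tag{4.11}$$

where $n$ is the sample size and $\hat\mu$, $\hat\sigma$ are the sample mean and standard deviation. Put

$$\hat t = \frac{\hat\mu - \mu_{ref}}{\hat\sigma}, \qquad t_0 = \frac{s_0 - \mu_{ref}}{\sigma}, \qquad t_1 = \frac{s_1 - \mu_{ref}}{\sigma} \tag{4.12}$$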
Remark 4.1. In the context of the pentanomial model described above, $\mu_{ref} = 1/2$, $n = N/2$, and the t-values in (4.12) are the normalized t-values multiplied by $\sqrt{2}$. In the trinomial case the t-values coincide with the normalized ones (and $n = N$).


Assuming that $\hat\sigma$ is a good approximation for $\sigma$, we find from (4.11)

$$\mathrm{LLR} \approx \frac{n}{2}\log\left(\frac{1 + (\hat t - t_0)^2}{1 + (\hat t - t_1)^2}\right) \tag{4.13}$$


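It is, however, a bit inelegant that (4.13) does not reduce to the corresponding trinomial approximation, even with a perfectly balanced book. Taking advantage of the fact that in practice $\hat t$, $t_0$ and $t_1$ will be small compared to 1, and combining Remark 4.1 with the fact that for small $x$ we have $1 + 2x \approx (1 + x)^2$, we arrive at our final formula

$$\mathrm{LLR} \approx \frac{N}{2}\log\left(\frac{1 + (\hat t_n - t_{n,0})^2}{1 + (\hat t_n - t_{n,1})^2}\right) \tag{4.14}$$

Indeed, in the pentanomial case (4.13) reads $\frac{N}{4}\log\frac{1 + 2(\hat t_n - t_{n,0})^2}{1 + 2(\hat t_n - t_{n,1})^2}$, which becomes (4.14) after replacing $1 + 2x$ by $(1 + x)^2$, while in the trinomial case (4.14) coincides with (4.13). Hence (4.14) is valid in both the trinomial and the pentanomial case. That (4.14) performs entirely satisfactorily is confirmed by simulation (see Figure …).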
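A sketch of (4.14) for pentanomial data. It estimates $\hat t_n$ from game-pair counts (excess score divided by $\sqrt{2}$ times the pair standard deviation, i.e. by $\sigma_{pg}$) and converts normalized Elo bounds into $t_{n,0}, t_{n,1}$ via (3.2). The zero-count replacement follows the footnote below; the value of `eps` and all function names are hypothetical choices.

```python
import math

C_ET = 800 / math.log(10)  # C_{e/t}: converts normalized t-values into normalized Elo


def pentanomial_t_hat(counts, eps=1e-3):
    """Estimate (hat t_n, N) from counts of game-pair scores 0, 1/4, 2/4, 3/4, 1.

    Zero counts are replaced by a small eps (hypothetical default) so that sigma > 0.
    """
    counts = [c if c > 0 else eps for c in counts]
    scores = (0.0, 0.25, 0.5, 0.75, 1.0)
    pairs = sum(counts)
    mu = sum(c * a for c, a in zip(counts, scores)) / pairs
    var = sum(c * (a - mu) ** 2 for c, a in zip(counts, scores)) / pairs
    return (mu - 0.5) / math.sqrt(2.0 * var), 2.0 * pairs


def llr_approx(t_hat, n_games, t0, t1):
    """(4.14): (N/2) log((1 + (t_hat - t0)^2) / (1 + (t_hat - t1)^2))."""
    return n_games / 2 * math.log((1 + (t_hat - t0) ** 2) / (1 + (t_hat - t1) ** 2))


if __name__ == "__main__":
    t_hat, n = pentanomial_t_hat([10, 180, 420, 200, 12])
    # H0: e_n = 0 versus H1: e_n = 2 normalized Elo, converted with (3.2).
    print(llr_approx(t_hat, n, 0.0 / C_ET, 2.0 / C_ET))
```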


Remark 4.2. As $\hat t_n$, $t_{n,0}$ and $t_{n,1}$ will be small compared to 1, (4.14) can be further approximated by

$$\mathrm{LLR} \approx \frac{N}{2}\Big((\hat t_n - t_{n,0})^2 - (\hat t_n - t_{n,1})^2\Big) = \frac{N}{2}(t_{n,1} - t_{n,0})(2\hat t_n - t_{n,0} - t_{n,1}) \tag{4.15}$$

This formula works just as well, but it needs to be regularized in some way. Indeed, at the beginning of a test (say after a few game pairs with identical outcomes) $\hat\sigma$ will still be very small⁴ and hence $\hat t_n$ may be spuriously large. Then the same will be true for (4.15).

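It is easy to see that for small $t_{n,0}, t_{n,1}$ the extremal values of the log-factor in (4.14) are approximately $\pm(t_{n,1} - t_{n,0})$: writing $y = \hat t_n - (t_{n,0} + t_{n,1})/2$, to first order in $t_{n,1} - t_{n,0}$ the log-factor equals $(t_{n,1} - t_{n,0}) \cdot 2y/(1 + y^2)$, and $2y/(1 + y^2)$ ranges over $[-1, 1]$. This suggests an easy-to-use regularization rule which applies to (4.15) but also to other approximations: LLR/N should be clamped to the interval $[-(t_{n,1} - t_{n,0})/2,\ (t_{n,1} - t_{n,0})/2]$.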
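The regularized formula (4.15) with the clamping rule, as a minimal sketch (function name hypothetical, $t_{n,1} > t_{n,0}$ assumed):

```python
def llr_regularized(t_hat, n_games, t0, t1):
    """(4.15) with LLR/N clamped to [-(t1 - t0)/2, (t1 - t0)/2], assuming t1 > t0."""
    llr = n_games / 2 * (t1 - t0) * (2 * t_hat - t0 - t1)
    half_width = (t1 - t0) / 2
    per_game = min(max(llr / n_games, -half_width), half_width)
    return n_games * per_game


if __name__ == "__main__":
    # A spuriously large t_hat early in a test (tiny sigma-hat) is capped at N * (t1 - t0) / 2.
    print(llr_regularized(10.0, 10, 0.0, 0.01))
    # A realistic t_hat stays inside the interval and is left unchanged.
    print(llr_regularized(0.006, 1000, 0.0, 0.01))
```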
⁴In fact $\hat\sigma = 0$. However, for simplicity we replace zero outcome frequencies with a small $\epsilon > 0$ to avoid division by zero.


